    Finding Associations and Computing Similarity via Biased Pair Sampling

    This version is superseded by a full version, available at http://www.itu.dk/people/pagh/papers/mining-jour.pdf, which contains stronger theoretical results and fixes a mistake in the reporting of experiments.
    Abstract: Sampling-based methods have previously been proposed for the problem of finding interesting associations in data, even for low-support items. While these methods do not guarantee precise results, they can be vastly more efficient than approaches that rely on exact counting. However, no such methods were known for many similarity measures. In this paper we show how a wide variety of measures can be supported by a simple biased sampling method. The method also extends to finding high-confidence association rules. We demonstrate theoretically that our method is superior to exact methods when the threshold for "interesting similarity/confidence" is above the average pairwise similarity/confidence and the average support is not too low. Our method is particularly good when transactions contain many items. We confirm in experiments on standard association mining benchmarks that this gives a significant speedup on real data sets (sometimes much larger than the theoretical guarantees). Reductions in computation time of over an order of magnitude, and significant savings in space, are observed.
    Comment: This is an extended version of a paper that appeared at the IEEE International Conference on Data Mining, 2009. The conference version is (c) 2009 IEEE.
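
    The abstract does not spell out the sampling rule, so the following is only a minimal sketch of the general idea for one measure. That measure is cosine similarity cos(i, j) = n_ij / sqrt(n_i n_j), where n_i is the support of item i and n_ij is the number of transactions that contain both items. The sketch assumes that each co-occurrence of a pair is kept with probability p_ij = min(1, c / sqrt(n_i n_j)). Whenever p_ij < 1, the expected number of kept occurrences is then c · cos(i, j), so similar pairs show up often in the sample and dissimilar ones rarely. The function names and the parameter c are hypothetical. For clarity, the sketch still enumerates every pair in every transaction, so it illustrates the estimator rather than the speedups reported above.

```python
import math
import random
from collections import Counter
from itertools import combinations


def biased_pair_sample(transactions, c, seed=0):
    """Estimate cosine similarity of co-occurring item pairs from a biased sample.

    Hypothetical sketch (not the paper's algorithm): an occurrence of pair (i, j)
    is kept with probability p_ij = min(1, c / sqrt(n_i * n_j)), n_i = support of i.
    """
    rng = random.Random(seed)
    support = Counter(item for t in transactions for item in set(t))

    def keep_prob(i, j):
        return min(1.0, c / math.sqrt(support[i] * support[j]))

    kept = Counter()
    for t in transactions:
        for i, j in combinations(sorted(set(t)), 2):
            if rng.random() < keep_prob(i, j):
                kept[(i, j)] += 1

    # Scaling by 1/p_ij estimates n_ij; dividing by sqrt(n_i * n_j) gives cosine.
    return {
        (i, j): count / keep_prob(i, j) / math.sqrt(support[i] * support[j])
        for (i, j), count in kept.items()
    }


def report_pairs(estimates, threshold):
    """Pairs whose estimated similarity reaches the threshold, in sorted order."""
    return sorted(pair for pair, sim in estimates.items() if sim >= threshold)


transactions = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"]]
estimates = biased_pair_sample(transactions, c=100.0)
print(report_pairs(estimates, threshold=0.5))  # prints [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

    With a large c, every p_ij is 1 and the estimates are exact. Lowering c keeps fewer pair occurrences at the cost of noisier estimates.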

    Data collection framework for understanding UFT within city logistics solutions

    Urban Freight Transport (UFT) is a fundamental component of city life. It involves a vast range of activities arising from relationships among different actors with conflicting needs and goals. Manufacturers are interested in fast and on-time deliveries; retailers require a complete assortment and frequent deliveries; citizens wish to have easy access to goods without losing quality of life; and City Authorities have to deal with the negative externalities of UFT (i.e. congestion, air and noise pollution, and safety issues). In practice, few cities have a well-developed and comprehensive city logistics strategy, because authorities generally focus their attention on passenger transport. When city logistics measures have been conceived and implemented, the requirements of private actors have frequently not been considered sufficiently. The European Commission counts the lack of data on, and understanding of, freight flows among the main obstacles to improving operational efficiency and the planning process for a UFT that is sustainable in economic, social and environmental terms. The research community also raises the issue of the unavailability or low quality of urban freight data. It likewise stresses the need to identify effective data collection methods in order to understand processes and actors' behavior and then define appropriate city logistics solutions. The NOVELOG EU project is providing city authorities and practitioners with a new framework that systematizes all the data to be collected, directly or indirectly, and processed in order to understand and represent the different aspects of the UFT sector. To achieve complete knowledge, the framework approaches this sector through four main thematic pillars: 1) profile of the major supply chains served in the urban area under study; 2) mapping of urban freight and service trip activity; 3) organizational and legal framework; 4) procedural and technological methods and innovations. The present paper introduces the framework and the guidance it provides to its target audience.

    On Finding Frequent Patterns in Event Sequences

    Given a directed acyclic graph with labeled vertices, we consider the problem of finding the most common label sequences ("traces") among all paths in the graph (of some maximum length m). Since the number of paths can be huge, we propose novel algorithms whose time complexity depends only on the size of the graph and on the frequency epsilon of the most frequent traces. In addition, we apply techniques from streaming algorithms to achieve space usage that depends only on epsilon, and not on the number of distinct traces. The abstract problem we consider models a variety of tasks concerning finding frequent patterns in event sequences. Our motivation comes from working with a data set of 2 million RFID readings from baggage trolleys at Copenhagen Airport. The question of finding frequent passenger movement patterns is mapped to the above problem. We report on experimental findings for this data set.
    Comment: Appears in proceedings of ICDM '10: The 10th IEEE International Conference on Data Mining. Publisher: IEEE
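
    To make the problem statement concrete, the sketch below enumerates the trace of every path with at most m vertices by brute force. It then passes the traces through the Misra-Gries frequent-items summary, a standard streaming technique that keeps at most k - 1 counters. Misra-Gries retains every item that occurs more than n/k times in a stream of length n, so choosing k ≥ 1/epsilon keeps every trace whose frequency exceeds epsilon. Whether the paper uses Misra-Gries specifically is an assumption. The brute-force enumeration also takes time proportional to the number of paths, which is exactly the cost the paper's algorithms are designed to avoid. The graph format (a successor dict plus a label dict) and all names are hypothetical.

```python
def dag_traces(succ, labels, m):
    """Yield the label sequence of every path with 1..m vertices (brute force).

    succ maps a vertex to its successors; labels maps every vertex to its label.
    """
    def extend(path):
        yield tuple(labels[v] for v in path)
        if len(path) < m:
            for w in succ.get(path[-1], []):
                yield from extend(path + [w])

    for v in labels:
        yield from extend([v])


def misra_gries(stream, k):
    """Misra-Gries summary with at most k - 1 counters (assumed streaming technique).

    Every item occurring more than n / k times in a stream of length n survives,
    and each stored count underestimates the true count by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop the ones that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical chain A -> B -> A -> B, traces of at most m = 2 vertices.
labels = {1: "A", 2: "B", 3: "A", 4: "B"}
succ = {1: [2], 2: [3], 3: [4]}
print(misra_gries(dag_traces(succ, labels, m=2), k=4))
```

    Because the surviving counts may be underestimates, a second pass over the traces is needed if exact frequencies of the candidates are required.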

    Grouping complex systems: a weighted network comparative analysis

    In this study, the authors compare two inter-municipal commuting networks (MCN) pertaining to the Italian islands of Sardinia and Sicily, characterizing them through weighted network analysis. They build on the results obtained for the MCN of Sardinia (De Montis et al. 2007) and attempt to use network analysis as a means of detecting similarities or dissimilarities between the two systems.
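
    A basic quantity in this kind of weighted network analysis is node strength, s_i = Σ_j w_ij, which in a commuting network is the total flow through municipality i. Plotting the average strength s(k) of nodes with degree k is one common way to set two networks side by side. The sketch below computes s(k) from an edge list. It assumes an undirected network without self-loops in which each (i, j, w) triple appears once, with flows in both directions already summed. The function name and input format are hypothetical, and the abstract does not state which measures the authors actually compare.

```python
from collections import defaultdict


def strength_by_degree(edges):
    """Average node strength s(k) for each degree k of an undirected weighted graph.

    edges: iterable of (i, j, w), each undirected link listed once, no self-loops
    (assumed input format).
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for i, j, w in edges:
        for node in (i, j):
            degree[node] += 1
            strength[node] += w  # s_i accumulates the weight of every incident link
    by_degree = defaultdict(list)
    for node, k in degree.items():
        by_degree[k].append(strength[node])
    return {k: sum(s) / len(s) for k, s in sorted(by_degree.items())}


# Hypothetical flows between four municipalities.
flows = [("A", "B", 10), ("A", "C", 4), ("B", "C", 2), ("C", "D", 1)]
print(strength_by_degree(flows))  # prints {1: 1.0, 2: 13.0, 3: 7.0}
```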

    Modeling commuting systems through a complex network analysis: a study of the Italian islands of Sardinia and Sicily

    This study analyzes the inter-municipal commuting systems of the Italian islands of Sardinia and Sicily using weighted network analysis techniques. Building on the results obtained for the Sardinian commuting network, it uses network analysis to identify similarities and dissimilarities between the two systems.

    Emergent topological and dynamical properties of a real inter-municipal commuting network - perspectives for policy-making and planning

    A variety of phenomena can be explained by describing the features of their underlying network structure. In addition, a large number of scientists (see the reviews, e.g. Barabasi, 2002; Watts, 2003) have demonstrated the emergence of large-scale properties common to many different systems. These results and studies led to what can be termed the “new science of complex networks” and to the emergence of the new “age of connectivity”. In the realms of urban and environmental planning, spatial analysis and regional science, many scientists have shown increasing interest in recent years in research developments on complex networks. Their studies range from theoretical statements on the need to apply complex network analysis to spatial phenomena (Salingaros, 2001) to empirical, quantitative research on urban space syntax (Jiang and Claramunt, 2004). Concerning transportation systems analysis, interesting results have recently been obtained on subway networks (Latora and Marchiori, 2002; Gastner and Newman, 2004) and airports (Barrat et al., 2004). In this paper, we study the inter-municipal commuting network of Sardinia (Italy). In this complex weighted network, the nodes correspond to urban centres, and the weight of the link between two municipalities represents the flow of individuals between them. Following the analysis developed by Barrat et al. (2004), we investigate the topological and dynamical properties of this complex weighted network. The topology of this network can be accurately described by a regular small-world network, while the traffic structure is very rich and reveals highly complex traffic patterns. Finally, from the perspective of policy-making and planning, we compare the emerging network behaviors with the geographical, social and demographic aspects of the transportation system.
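
    Among their measures, Barrat et al. (2004) define a weighted clustering coefficient c_i^w = (1 / (s_i (k_i - 1))) Σ_{j,h} ((w_ij + w_ih) / 2) a_ij a_ih a_jh. Here k_i is the degree, s_i the strength, a the adjacency matrix, and the sum runs over ordered pairs of neighbours. The sketch below transcribes that formula directly. It assumes an undirected network given as (i, j, w) triples, which for commuting flows means summing the two directions. It is an illustration only, not the authors' code.

```python
from itertools import permutations


def weighted_clustering(edges):
    """Weighted clustering coefficient in the form of Barrat et al. (2004).

    Illustrative transcription; edges: iterable of (i, j, w_ij), undirected (assumed).
    """
    adj = {}
    for i, j, w in edges:
        adj.setdefault(i, {})[j] = w
        adj.setdefault(j, {})[i] = w
    coeff = {}
    for i, nbrs in adj.items():
        k, s = len(nbrs), sum(nbrs.values())
        if k < 2:
            coeff[i] = 0.0  # convention: no triangle can pass through i
            continue
        # Ordered neighbour pairs (j, h) that are linked to each other close a triangle.
        total = sum((nbrs[j] + nbrs[h]) / 2
                    for j, h in permutations(nbrs, 2) if h in adj[j])
        coeff[i] = total / (s * (k - 1))
    return coeff


edges = [("a", "b", 3), ("a", "c", 1), ("a", "d", 1), ("b", "c", 1)]
print(weighted_clustering(edges)["a"])  # prints 0.4
```

    When all weights are equal, c_i^w reduces to the ordinary clustering coefficient (1/3 for node a above). A larger value, as here, indicates that the node's heavier links tend to lie in triangles.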

    Prognostic factors influencing mortality in esophagectomy

    OBJECTIVE: Over the last 15 years, technical improvements have contributed to reducing the postoperative mortality rate from 29% to 8%. The aim of this study is to retrospectively analyze the role of different factors in the postoperative mortality of 63 patients who underwent esophagectomy for the treatment of cancer. METHODS: Sixty-three patients underwent esophagectomy with the stomach as the substitute. The surgical procedures included transthoracic esophagectomy in 49 patients and transhiatal esophagectomy in 14 cases. Among the 49 transthoracic esophagectomy patients, 18 (37%) had a high anesthetic risk (ASA ≥ III). Fourteen patients underwent transhiatal esophagectomy. RESULTS: Operative mortality was 14% after transhiatal esophagectomy and 22% after transthoracic esophagectomy (p = ns). Mortality among patients with a high anesthetic risk was 47% after transthoracic esophagectomy and 10% after transhiatal esophagectomy (p < 0.05). DISCUSSION: In our experience, mortality was nearly 18% and 22% after transthoracic esophagectomy. Among the high-anesthetic-risk patients who underwent surgery, postoperative mortality was significantly lower after transhiatal esophagectomy (10%) than after transthoracic esophagectomy (47%) (p < 0.05).
    PURPOSE: In 1980, operative mortality for esophageal resection was 29%. Over the last 15 years, technical and critical care improvements have contributed to reducing the postoperative mortality rate to 8%. The aim of this study is to retrospectively analyze the role of different factors (surgical procedure, stage of the disease, and anesthetic risk) in the postoperative mortality of 63 patients who underwent esophagectomy with gastric interposition for cancer. METHODS: Seventy-two patients underwent esophagectomy. The stomach was the esophageal substitute in 63 cases. Surgical procedures included transthoracic esophagectomy in 49 patients and transhiatal esophagectomy in 14 cases. Among the 49 transthoracic esophagectomy patients, there were 18 patients with a high anesthetic risk (ASA III). Among the patients who underwent transhiatal esophagectomy, there were 10 patients with a high anesthetic risk (ASA III). RESULTS: The operative mortality rate was 14% (2/14) in the transhiatal esophagectomy group and 22% (11/49) in the transthoracic esophagectomy group (P = ns). The postoperative mortality of patients with a high anesthetic risk (ASA III) was 47% (8/17) after transthoracic esophagectomy and 10% (1/10) after transhiatal esophagectomy (

    Automated vehicles and the rethinking of mobility and cities

    The CityMobil2 project has carried out a forward-looking exercise to investigate alternative cybermobility scenarios, including both niche and large-market innovations, and their impacts on European cities and their transport systems. The paper describes the current status of, and main trends in, automated vehicles; a preliminary vision of the future city with mobility supported mainly by automated vehicles; and freight distribution. The expected positive impacts derive from the development of car sharing, the reduction of space required for parking, the possibility for older people and people with disabilities to use cars, enhanced safety, and improved efficiency of the transport system.

    On Parallelizing Matrix Multiplication by the Column-Row Method

    We consider the problem of sparse matrix multiplication by the column-row method in a distributed setting where the matrix product is not necessarily sparse. We present a surprisingly simple method for “consistent” parallel processing of sparse outer products (column-row vector products) over several processors, in a communication-avoiding setting where each processor has a copy of the input. The method is consistent in the sense that a given output entry is always assigned to the same processor, independently of the specific structure of the outer product. We show guarantees on the work done by each processor and achieve linear speedup down to the point where the cost is dominated by reading the input. Our method gives a way of distributing (or parallelizing) matrix product computations in settings where the main bottlenecks are storing the result matrix and inter-processor communication. Motivated by the observation that, in real data, the absolute values of the entries in the product often follow a power law, we combine our approach with frequent items mining algorithms and show how to obtain a tight approximation of the weight of the heaviest entries in the product matrix. As a case study, we present the application of our approach to frequent pair mining in transactional data streams, a problem that can be phrased in terms of sparse {0, 1} integer matrix multiplication by the column-row method. Experimental evaluation of the proposed method on real-life data supports the theoretical findings.
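
    The abstract does not say how output entries are mapped to processors, so the following is a minimal sketch under an assumed scheme. The processors form an r × c grid, and entry (i, j) is owned by processor (h(i) mod r, h(j) mod c) for a fixed hash h. The owner depends only on (i, j), never on which outer product (column k of A times row k of B) contributes to that entry. Every processor can therefore read the whole input and multiply only its own pieces, and no entry is produced twice. The grid layout, the CRC-32 hash, and all names are hypothetical. The sketch makes no attempt at the paper's work guarantees under skewed data.

```python
import zlib
from collections import defaultdict


def bucket(key, m):
    """Deterministic (cross-process) hash of an index into m buckets; assumed choice."""
    return zlib.crc32(repr(key).encode()) % m


def column_row_share(a_cols, b_rows, r, c, x, y):
    """Entries of C = A @ B owned by processor (x, y) on an assumed r-by-c grid.

    a_cols[k] is column k of A as {i: value}; b_rows[k] is row k of B as {j: value}.
    Entry (i, j) is owned by (bucket(i, r), bucket(j, c)), whatever k contributes to it.
    """
    share = defaultdict(int)
    for col, row in zip(a_cols, b_rows):
        # Each processor scans the full input but keeps only its rows and columns.
        left = [(i, a) for i, a in col.items() if bucket(i, r) == x]
        right = [(j, b) for j, b in row.items() if bucket(j, c) == y]
        for i, a in left:
            for j, b in right:
                share[(i, j)] += a * b  # one term of the outer product col * row
    return dict(share)


def parallel_product(a_cols, b_rows, r, c):
    """Simulate all r * c processors sequentially and merge their disjoint shares."""
    result = {}
    for x in range(r):
        for y in range(c):
            share = column_row_share(a_cols, b_rows, r, c, x, y)
            if not result.keys().isdisjoint(share):
                raise ValueError("an output entry was assigned to two processors")
            result.update(share)
    return result


# Frequent pair mining: each transaction is a column of A and a row of B = A^T.
transactions = [["a", "b"], ["a", "b", "c"], ["b", "c"]]
cols = [{item: 1 for item in t} for t in transactions]
pair_counts = parallel_product(cols, cols, r=2, c=2)
print(pair_counts[("a", "b")])  # prints 2
```

    In the pair-mining encoding, entry (i, j) of the product counts the transactions that contain both items, and the diagonal entries are the item supports.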